feature: fix langfuse thinking and text-edit issues (#371); omit diffs to reduce peak memory (#376)
claude-code-best merged 6 commits into main
Conversation
Add a thinking configuration (type/budgetTokens) to recordLLMObservation; all providers (claude/gemini/openai) plus the tokenEstimation and sideQuery call sites now pass the thinking info along, so thinking usage can be observed in the Langfuse panel. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Langfuse tracing now passes the whole thinking object (including type and budget_tokens); the Analytics log gains a thinkingBudgetTokens field, and logAPIQuery now accepts a ThinkingConfig-typed parameter. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Read tool output renders tabs as spaces, so text the user copies from it cannot be matched by the Edit tool. Add a Tab→space normalization fallback match in findActualString and map the match precisely back to the original file position. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
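A minimal sketch of the fallback just described, under the assumption of a simplified shape (the real `findActualString` also maps the normalized hit back to original file offsets; the function name and signature here are illustrative, not the actual API):

```typescript
// Hypothetical simplified sketch of the Tab→space fallback: if the search
// string (copied from Read output, where tabs render as spaces) is not found
// verbatim, retry against a copy of the file with tabs expanded to 4 spaces.
// Note the fallback index is in normalized coordinates; the real code maps
// it back to the original file position.
function findWithTabFallback(fileContent: string, search: string): number {
  const direct = fileContent.indexOf(search)
  if (direct !== -1) return direct
  // Fallback: expand every tab to 4 spaces and search the normalized text.
  return fileContent.replace(/\t/g, '    ').indexOf(search)
}
```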
No actionable comments were generated in the recent review. 🎉
ℹ️ Recent review info — Configuration used: defaults; Review profile: CHILL; Plan: Pro
📒 Files selected for processing (1)
✅ Files skipped from review due to trivial changes (1)
📝 Walkthrough
Enhances file string matching with tab-to-space fuzzy normalization and tests, and threads a thinking configuration through provider logging into Langfuse tracing.
Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client
    participant API as API Handler (Claude/Gemini/OpenAI)
    participant Logger as logAPIQuery
    participant Tracer as recordLLMObservation
    participant Langfuse as Langfuse
    Client->>API: send request (includes thinkingConfig)
    API->>Logger: logAPIQuery(..., thinkingConfig)
    Logger->>Tracer: recordLLMObservation(rootSpan, { provider, model, thinking: thinkingConfig })
    Tracer->>Langfuse: send generation span metadata (includes thinking/budgetTokens)
    Langfuse-->>Tracer: ack
    Tracer-->>Logger: done
    Logger-->>API: logged
    API-->>Client: response (and possible stream fallback events with thinkingBudgetTokens)
```
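The flow above can be sketched as follows; the types and signatures are assumptions inferred from the names in this PR, not the project's actual module APIs:

```typescript
// Illustrative types only — inferred from the PR description.
type ThinkingConfig =
  | { type: 'enabled'; budgetTokens: number }
  | { type: 'disabled' }

interface LLMObservation {
  provider: string
  model: string
  thinking?: ThinkingConfig
}

// Stand-in for recordLLMObservation: forwards the whole thinking object
// (type and budget) into the Langfuse generation metadata.
function recordLLMObservation(meta: LLMObservation): LLMObservation {
  return meta // real code would attach this to a Langfuse generation span
}

// Stand-in for logAPIQuery: now accepts a ThinkingConfig-typed parameter
// and threads it through unchanged.
function logAPIQuery(
  provider: string,
  model: string,
  thinking?: ThinkingConfig,
): LLMObservation {
  return recordLLMObservation({ provider, model, thinking })
}

const obs = logAPIQuery('claude', 'claude-opus-4', { type: 'enabled', budgetTokens: 8192 })
```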
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (warning)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/components/Messages.tsx (1)

819-847: ⚠️ Potential issue | 🟡 Minor — Add `shouldCollapseDiffs` to memo comparators or clarify intent.

`shouldCollapseDiffs` is not checked in `areMessageRowPropsEqual` (MessageRow.tsx:319) or `areMessagePropsEqual` (Message.tsx:507). When a new message is appended, `shouldCollapseDiffs` flips from `false` → `true` for the previously-latest tool_result row. However, both memoized comparators return `true` (skip re-render) for static/resolved messages, so the component never re-renders to collapse the diff despite the prop change. If the intent is "collapse only at first paint," add a clarifying code comment. If the intent is "actively transition older diffs to condensed as the conversation grows," add `shouldCollapseDiffs` to at least one of the comparators (likely `areMessageRowPropsEqual` when the message contains a tool_result block).
shouldCollapseDiffsto at least one of the comparators (likelyareMessageRowPropsEqualwhen the message contains a tool_result block).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/components/Messages.tsx` around lines 819 - 847, The prop shouldCollapseDiffs is changing but not included in the memo comparators, so add it to the equality check (or document intent). Update areMessageRowPropsEqual (in MessageRow.tsx) to compare prev.shouldCollapseDiffs !== next.shouldCollapseDiffs and return false when it differs (especially for rows with tool_result content), or if the desired behavior is to only apply collapse on first render, add a clear comment near the shouldCollapseDiffs calculation in Messages.tsx and keep comparators unchanged; ensure areMessagePropsEqual (Message.tsx) is also updated if Message consumes shouldCollapseDiffs directly.
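A sketch of the comparator change suggested in this comment, with an illustrative (hypothetical) prop shape — the real `MessageRow` props have many more fields:

```typescript
// Illustrative prop shape only; not the project's actual MessageRowProps.
interface MessageRowProps {
  shouldCollapseDiffs: boolean
  resolved: boolean
}

// Sketch of the suggested fix: compare shouldCollapseDiffs first so a
// false → true flip forces a re-render even for static/resolved rows.
function areMessageRowPropsEqual(prev: MessageRowProps, next: MessageRowProps): boolean {
  if (prev.shouldCollapseDiffs !== next.shouldCollapseDiffs) return false
  // ...existing comparisons (simplified here to the resolved flag)
  return prev.resolved === next.resolved
}
```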
🧹 Nitpick comments (5)
src/components/Messages.tsx (1)
817-822: Hoist `DIFF_COLLAPSE_DISTANCE` to module scope and clarify the comment.

`DIFF_COLLAPSE_DISTANCE` is a tuning knob declared inside `renderMessageRow` (re-created on every row), and with the current value `0` the comment "beyond the latest N messages" reads as if multiple recent messages stay verbose — only the single last one does. Hoisting it next to `MAX_MESSAGES_WITHOUT_VIRTUALIZATION` / `MESSAGE_CAP_STEP` makes it discoverable and tunable.

♻️ Suggested refactor

```diff
 const MAX_MESSAGES_WITHOUT_VIRTUALIZATION = 200
 const MESSAGE_CAP_STEP = 50
+
+// Tool-result diffs are collapsed for every message except the last
+// DIFF_COLLAPSE_DISTANCE messages (verbose / ctrl+o overrides). Set to 0
+// so only the most recent tool_result keeps its full diff — older diffs
+// drop to 'condensed' to reduce memory peak on large transcripts.
+const DIFF_COLLAPSE_DISTANCE = 0
```

```diff
-  // Collapse diffs for messages beyond the latest N messages.
-  // verbose (ctrl+o) overrides and always shows full diffs.
-  const DIFF_COLLAPSE_DISTANCE = 0
   const shouldCollapseDiffs = renderableMessages.length - 1 - index > DIFF_COLLAPSE_DISTANCE
```
Verify each finding against the current code and only fix it if needed. In `@src/components/Messages.tsx` around lines 817 - 822, Hoist the DIFF_COLLAPSE_DISTANCE constant out of renderMessageRow to module scope next to MAX_MESSAGES_WITHOUT_VIRTUALIZATION and MESSAGE_CAP_STEP so it isn't re-created per row and is easier to tune; update the inline comment where DIFF_COLLAPSE_DISTANCE is used to accurately describe behavior (e.g. "Collapse diffs for messages beyond the latest N messages; when N=0 only the most recent message remains expanded") and keep the existing usage of shouldCollapseDiffs = renderableMessages.length - 1 - index > DIFF_COLLAPSE_DISTANCE unchanged so the logic still references the hoisted constant.

src/services/api/claude.ts (2)
2554-2556: Optional: extract the duplicated `thinkingBudgetTokens` spread.

The same conditional appears three times (disabled-fallback path, normal-fallback path, 404-fallback path). A small helper would tidy this up, but it's purely cosmetic.

```ts
const thinkingAnalyticsFields =
  thinkingConfig.type === 'enabled'
    ? { thinkingBudgetTokens: thinkingConfig.budgetTokens }
    : {}
```

…then spread `...thinkingAnalyticsFields` at the three sites.

Also applies to: 2589-2591, 2708-2710
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/services/api/claude.ts` around lines 2554 - 2556, Extract the duplicated conditional spread for thinkingBudgetTokens into a small helper object (e.g., create thinkingAnalyticsFields) so you only compute it once from thinkingConfig: set thinkingAnalyticsFields = thinkingConfig.type === 'enabled' ? { thinkingBudgetTokens: thinkingConfig.budgetTokens } : {} and then replace the three occurrences of ...(thinkingConfig.type === 'enabled' && { thinkingBudgetTokens: thinkingConfig.budgetTokens }) (the disabled-fallback, normal-fallback and 404-fallback sites) with ...thinkingAnalyticsFields; keep the symbol names thinkingConfig and thinkingBudgetTokens unchanged so the refactor is minimal and local.
1782-1793: Inconsistent field naming with peer providers — consider normalizing to `budgetTokens` for consistency.

`langfuseThinking` captures `queryParams.thinking` directly, which uses `budget_tokens` (snake_case from the Anthropic SDK). However, `recordLLMObservation` accepts both `budget_tokens` and `budgetTokens` (per its type signature and JSDoc at `src/services/langfuse/tracing.ts` lines 81-90), so this will work without issue.

That said, every other provider normalizes to camelCase before passing to `recordLLMObservation`:

- `src/services/api/gemini/index.ts` (lines 196-204): `{ type, budgetTokens }`
- `src/utils/sideQuery.ts` (lines 297-302): `{ type, budgetTokens: thinkingConfig.budget_tokens }`
- `src/services/tokenEstimation.ts` (line 357): `{ type: 'enabled', budgetTokens }`

Aligning claude.ts to this pattern improves consistency and avoids relying on the permissive dual-format acceptance.

Optional: Normalize to match peer providers

```diff
-  let langfuseThinking: BetaMessageStreamParams['thinking'] | undefined
+  let langfuseThinking:
+    | { type: 'enabled'; budgetTokens: number }
+    | { type: 'adaptive' }
+    | undefined
   {
     const queryParams = paramsFromContext({
       model: options.model,
       thinkingConfig,
     })
     const logMessagesLength = queryParams.messages.length
     const logBetas = useBetas ? (queryParams.betas ?? []) : []
     const logEffortValue = queryParams.output_config?.effort
-    if (queryParams.thinking && queryParams.thinking.type !== 'disabled') {
-      langfuseThinking = queryParams.thinking
-    }
+    if (queryParams.thinking?.type === 'enabled') {
+      langfuseThinking = {
+        type: 'enabled',
+        budgetTokens: queryParams.thinking.budget_tokens,
+      }
+    } else if (queryParams.thinking?.type === 'adaptive') {
+      langfuseThinking = { type: 'adaptive' }
+    }
```
Verify each finding against the current code and only fix it if needed. In `@src/services/api/claude.ts` around lines 1782 - 1793, langfuseThinking is set directly from queryParams.thinking which uses snake_case (budget_tokens), but other providers normalize to camelCase; update the block that assigns langfuseThinking (in the function that calls paramsFromContext and uses thinkingConfig) to map thinking.budget_tokens to thinking.budgetTokens (i.e., produce { type, budgetTokens } or equivalent) before passing into recordLLMObservation so the Claude path matches the camelCase shape used by gemini/index.ts, sideQuery.ts and tokenEstimation.ts and avoids relying on dual-format acceptance.

packages/builtin-tools/src/tools/FileEditTool/utils.ts (1)
121-128: Combined branch re-normalizes whitespace over already-quote-normalized strings — harmless but redundant.

Lines 122-123 call `normalizeWhitespace(normalizedFile)` and `normalizeWhitespace(normalizedSearch)`, where both inputs were produced earlier (lines 100-101). Quote normalization doesn't introduce or remove tabs, so `combinedFile === wsNormalizedFile` whenever no curly quotes exist and is a strict superset transformation otherwise. You can compute each normalization once and reuse, e.g. `const combinedFile = normalizeQuotes(wsNormalizedFile)` (and likewise for the search), avoiding one extra full-string traversal per call. Minor on small files; matters for very large ones where this is a hot path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/builtin-tools/src/tools/FileEditTool/utils.ts` around lines 121 - 128, The combined-branch is redundantly calling normalizeWhitespace(normalizedFile) and normalizeWhitespace(normalizedSearch) where normalizedFile/normalizedSearch were produced earlier; instead reuse the already-whitespace-normalized values (wsNormalizedFile, wsNormalizedSearch) and apply quote normalization once: replace combinedFile = normalizeWhitespace(normalizedFile) with combinedFile = normalizeQuotes(wsNormalizedFile) and combinedSearch = normalizeQuotes(wsNormalizedSearch), keeping the subsequent combinedIndex and mapNormalizedMatchBackToFile usage unchanged.

packages/builtin-tools/src/tools/FileEditTool/__tests__/utils.test.ts (1)
109-187: Coverage gap: ratio-fallback edge case in `mapNormalizedMatchBackToFile` is not exercised.

Every test where the match extends to the end of `fileContent` happens to also match the entire file starting at offset 0, which makes the `origEnd === -1` ratio heuristic in `mapNormalizedMatchBackToFile` resolve to the correct value by coincidence (`ratio * normalizedLength === fileContent.length`). A case like a partial trailing match in a file whose tab density is non-uniform (e.g. `"\t\t\t\t\t\t\t\t\t\there"` searched with `"    here"`) would expose the truncation bug noted on utils.ts. Consider adding such a case once the underlying mapping is hardened.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/builtin-tools/src/tools/FileEditTool/__tests__/utils.test.ts` around lines 109 - 187, Add a unit test that reproduces the ratio-fallback edge-case in mapNormalizedMatchBackToFile by using a fileContent with highly non-uniform tab density (e.g. many leading tabs then text) and a search string with spaces for the same trailing text (example: fileContent = "\t\t\t\t\t\t\t\t\t\there" and searchSpaces = " here"); use findActualString to locate the match and assert the result is not null, that fileContent.includes(result!), and that the returned substring equals the actual trailing slice of fileContent (so the test fails if origEnd === -1 mapping truncates/miscomputes the end). Ensure the new test is named clearly (e.g. "handles ratio-fallback for trailing partial matches") and placed in the same utils.test.ts block covering tab/space normalization to exercise mapNormalizedMatchBackToFile.
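A self-contained check of the fixture values the suggested test would pin down; the offsets below are computed from the normalization rule described in this review (tab expands to 4 spaces), not taken from the project code:

```typescript
// Fixture for the suggested trailing-partial-match case.
const fileContent = '\t\t\t\t\t\t\t\t\t\there' // 10 tabs + "here", length 14
const searchSpaces = '    here'                 // as copied from Read output
const normalized = fileContent.replace(/\t/g, '    ') // 40 spaces + "here"
const normStart = normalized.indexOf(searchSpaces)    // 36, inside the 10th tab
// The hardened mapping should return the full trailing slice of the file:
const expected = fileContent.substring(9) // "\there" (last tab through EOF)
```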
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9c26788a-29fc-439c-b7e5-9451565e1b0b
📒 Files selected for processing (15)
- packages/builtin-tools/src/tools/FileEditTool/__tests__/utils.test.ts
- packages/builtin-tools/src/tools/FileEditTool/utils.ts
- src/components/Message.tsx
- src/components/MessageRow.tsx
- src/components/Messages.tsx
- src/components/messages/UserToolResultMessage/UserToolResultMessage.tsx
- src/components/messages/UserToolResultMessage/UserToolSuccessMessage.tsx
- src/main.tsx
- src/services/api/claude.ts
- src/services/api/gemini/index.ts
- src/services/api/logging.ts
- src/services/api/openai/index.ts
- src/services/langfuse/tracing.ts
- src/services/tokenEstimation.ts
- src/utils/sideQuery.ts
```ts
/**
 * Normalizes whitespace for fuzzy matching by converting tabs to spaces
 * and collapsing leading whitespace on each line to a canonical form.
 * This handles the case where Read tool output renders tabs as spaces,
 * so users copy spaces from the output but the file actually has tabs.
 */
function normalizeWhitespace(str: string): string {
  return str.replace(/\t/g, '    ')
}
```
JSDoc doesn't match the implementation.
The doc says it "collapses leading whitespace on each line to a canonical form", but `str.replace(/\t/g, '    ')` replaces every tab in the string regardless of position (interior tabs included). Either tighten the implementation to leading-only whitespace or update the comment to describe what actually happens (global tab → 4-space expansion). Note the global behavior is also what the rest of the cascade relies on, so the comment is the simpler fix.
📝 Suggested doc tweak

```diff
 /**
- * Normalizes whitespace for fuzzy matching by converting tabs to spaces
- * and collapsing leading whitespace on each line to a canonical form.
- * This handles the case where Read tool output renders tabs as spaces,
- * so users copy spaces from the output but the file actually has tabs.
+ * Normalizes whitespace for fuzzy matching by globally expanding every
+ * tab character to 4 spaces. This handles the case where Read tool output
+ * renders tabs as spaces, so users copy spaces from the output but the
+ * file actually has tabs. Tab width is fixed at 4 — files using a
+ * different tab width will not match via this fallback.
  */
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/builtin-tools/src/tools/FileEditTool/utils.ts` around lines 66 - 74,
The JSDoc for normalizeWhitespace is inaccurate: the function currently performs
a global tab expansion, not leading-whitespace collapsing. Update the comment
above normalizeWhitespace to state that it converts all tab characters to four
spaces (global replacement) for consistent fuzzy matching between Read tool
output and file contents, and remove the mention of collapsing leading
whitespace so it matches the implementation.
```ts
function mapNormalizedMatchBackToFile(
  fileContent: string,
  normalizedFile: string,
  normalizedStart: number,
  normalizedLength: number,
): string {
  // Build a sparse mapping from normalized position → original position.
  // We only need to map the range [normalizedStart, normalizedStart + normalizedLength].
  let normPos = 0
  let origPos = 0
  let origStart = -1
  let origEnd = -1

  while (origPos < fileContent.length && normPos <= normalizedStart + normalizedLength) {
    if (normPos === normalizedStart) {
      origStart = origPos
    }
    if (normPos === normalizedStart + normalizedLength) {
      origEnd = origPos
      break
    }

    const origChar = fileContent[origPos]!
    if (origChar === '\t') {
      // Tab expands to 4 spaces in normalized version
      const nextNormPos = normPos + 4
      // If normalizedStart falls within this expanded tab, snap to origPos
      if (normPos < normalizedStart && nextNormPos > normalizedStart && origStart === -1) {
        origStart = origPos
      }
      if (normPos < normalizedStart + normalizedLength && nextNormPos > normalizedStart + normalizedLength && origEnd === -1) {
        origEnd = origPos + 1
      }
      normPos = nextNormPos
      origPos++
    } else {
      normPos++
      origPos++
    }
  }

  // Fallback: if we couldn't map precisely, use character-count heuristic
  if (origStart === -1) origStart = 0
  if (origEnd === -1) {
    // Approximate: use the ratio of original to normalized length
    const ratio = fileContent.length / normalizedFile.length
    origEnd = Math.round(origStart + normalizedLength * ratio)
  }

  return fileContent.substring(origStart, origEnd)
}
```
`mapNormalizedMatchBackToFile` can return a truncated substring when the match extends to EOF.
The loop terminates as soon as `origPos >= fileContent.length`, which occurs naturally when the match ends at the end of `fileContent`. In that case `origEnd` stays `-1` and the ratio fallback (`fileContent.length / normalizedFile.length`) is used — but that ratio is only valid if tab density is uniform across the whole file. With non-uniform tabs, the fallback under-estimates `origEnd` and silently returns a shorter substring than the actual match.
Counter-example:
- `fileContent = "\t\t\t\t\t\t\t\t\t\there"` (10 leading tabs, length 14)
- `normalizedFile.length = 44`
- search = `"    here"` → match at normalized index 36, length 8 → `origStart = 9`
- ratio = 14/44 ≈ 0.318 → `origEnd = round(9 + 8 * 0.318) = 12`
- returned: `"\the"` instead of `"\there"`
Because the callers in FileEditTool.ts (lines 303-315 and 458-467) and UI.tsx (lines 296-304) feed this result directly into `file.split(actualOldString)` for occurrence counting and into `getPatchForEdit` without any validation, a truncated return can silently miscount matches and produce the wrong replacement.
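The counter-example arithmetic can be verified directly; this is a standalone sketch reproducing the buggy ratio fallback, not the project code:

```typescript
// Reproduces the ratio-fallback miscalculation from the counter-example above.
const fileContent = '\t\t\t\t\t\t\t\t\t\there'            // 10 tabs + "here", length 14
const normalizedFile = fileContent.replace(/\t/g, '    ') // 40 spaces + "here", length 44
const normalizedStart = 36                                // where "    here" matches
const normalizedLength = 8                                // '    here'.length
const origStart = 9                                       // 36 / 4 tabs consumed
const ratio = fileContent.length / normalizedFile.length  // 14 / 44 ≈ 0.318
const origEnd = Math.round(origStart + normalizedLength * ratio) // 12, not 14
const truncated = fileContent.substring(origStart, origEnd)      // "\the", missing "re"
```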
🔧 Proposed fix — explicitly snap origEnd to end-of-file when the loop exhausted fileContent

```diff
 while (origPos < fileContent.length && normPos <= normalizedStart + normalizedLength) {
   // ... existing body ...
 }
-  // Fallback: if we couldn't map precisely, use character-count heuristic
-  if (origStart === -1) origStart = 0
-  if (origEnd === -1) {
-    // Approximate: use the ratio of original to normalized length
-    const ratio = fileContent.length / normalizedFile.length
-    origEnd = Math.round(origStart + normalizedLength * ratio)
-  }
+  // If the loop exhausted fileContent while the match still ran to (or past) its end,
+  // the match extends to EOF and origEnd should be the end of fileContent.
+  if (origEnd === -1 && normPos >= normalizedStart + normalizedLength) {
+    origEnd = fileContent.length
+  }
+  // Last-resort fallback (should not normally be reached given the cascade above).
+  if (origStart === -1) origStart = 0
+  if (origEnd === -1) origEnd = fileContent.length
```
+ if (origEnd === -1) origEnd = fileContent.length🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/builtin-tools/src/tools/FileEditTool/utils.ts` around lines 143 -
193, mapNormalizedMatchBackToFile can under-estimate origEnd when the loop exits
because origPos reached fileContent.length (match at EOF), causing the ratio
fallback to produce a truncated substring; fix by detecting when the loop
terminated due to reaching the end of file (origPos >= fileContent.length) and,
in that case, set origEnd = fileContent.length (or clamp origEnd to
fileContent.length) before applying the ratio fallback so matches that extend to
EOF return the full tail; update the logic inside mapNormalizedMatchBackToFile
to explicitly snap/clip origEnd to fileContent.length when the file was
exhausted.